Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race conditions in libcontainerd process handling #35809

Merged
merged 2 commits into from Dec 16, 2017

Conversation

cpuguy83
Copy link
Member

@cpuguy83 cpuguy83 commented Dec 15, 2017

  • Fix some missing synchronization in libcontainerd
  • Fix error handling for kill/process not found

With the contianerd 1.0 migration we now have strongly typed errors that
we can check for process not found.
We also had some bad error checks looking for ESRCH which would only
be returned from unix.Kill and never from containerd even though we
were checking containerd responses for it.

Fixes some race conditions around process handling and our error checks
that could lead to errors that propagate up to the user that should not.

Related to #35594

With the contianerd 1.0 migration we now have strongly typed errors that
we can check for process not found.
We also had some bad error checks looking for `ESRCH` which would only
be returned from `unix.Kill` and never from containerd even though we
were checking containerd responses for it.

Fixes some race conditions around process handling and our error checks
that could lead to errors that propagate up to the user that should not.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
@crosbymichael
Copy link
Contributor

LGTM

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
@vieux
Copy link
Contributor

vieux commented Dec 16, 2017

LGTM

@thaJeztah
Copy link
Member

z failure looks unrelated;

01:53:24 ----------------------------------------------------------------------
01:53:24 FAIL: check_test.go:366: DockerSwarmSuite.TearDownTest
01:53:24 
01:53:24 check_test.go:371:
01:53:24     d.Stop(c)
01:53:24 daemon/daemon.go:395:
01:53:24     t.Fatalf("Error while stopping the daemon %s : %v", d.id, err)
01:53:24 ... Error: Error while stopping the daemon db005154c4e62 : exit status 130
01:53:24 
01:53:24 
01:53:24 ----------------------------------------------------------------------
01:53:24 PANIC: docker_api_swarm_service_test.go:201: DockerSwarmSuite.TestAPISwarmServicesUpdateStartFirst
01:53:24 
01:53:24 [db005154c4e62] waiting for daemon to start
01:53:24 [db005154c4e62] daemon started
01:53:24 
01:53:24 [db005154c4e62] daemon started
01:53:24 Attempt #2: daemon is still running with pid 12408
01:53:24 Attempt #3: daemon is still running with pid 12408
01:53:24 Attempt #4: daemon is still running with pid 12408
01:53:24 [db005154c4e62] exiting daemon
01:53:24 ... Panic: Fixture has panicked (see related PANIC)
01:53:24 
01:53:24 ----------------------------------------------------------------------

restarting for good measure

@jchauncey
Copy link

So this doesnt seem to have fixed my problem =\

@jchauncey
Copy link

nevermind this has definitely solved the problem i was having with jenkins and docker.

@thaJeztah
Copy link
Member

Thanks for testing @jchauncey 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants